Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that
occupy  habitats  such  as  the  gut,  skin  and  vagina.  Much  of  this  diversity  remains  unexplained,  although  diet, 
environment,  host  genetics  and  early  microbial  exposure  have  all  been  implicated.  Accordingly,  to  characterize  the 
ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort
and  set  of  distinct,  clinically  relevant  body  habitats  so  far.  We  found  the  diversity  and  abundance  of  each  habitat’s 
signature  microbes  to  vary  widely  even  among  healthy  subjects,  with  strong  niche  specialization  both  within  and  among 
individuals.  The  project  encountered  an  estimated  81-99%  of  the  genera,  enzyme  families  and  community 
configurations  occupied  by  the  healthy  Western  microbiome.  Metagenomic  carriage  of  metabolic  pathways  was 
stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of
the  strongest  associations  of  both  pathways  and  microbes  with  clinical  metadata.  These  results  thus  delineate  the  range 
of  structural  and  functional  configurations  normal  in  the  microbial  communities  of  a  healthy  population,  enabling  future 
characterization  of  the  epidemiology,  ecology  and  translational  applications  of  the  human  microbiome. 


A  total  of  4,788  specimens  from  242  screened  and  phenotyped  adults1 
(129  males,  113  females)  were  available  for  this  study,  representing  the 
majority  of  the  target  Human  Microbiome  Project  (HMP)  cohort  of 
300  individuals.  Adult  subjects  lacking  evidence  of  disease  were 
recruited  based  on  a  lengthy  list  of  exclusion  criteria;  we  will  refer 
to  them  here  as  ‘healthy’,  as  defined  by  the  consortium  clinical 
sampling criteria (K. Aagaard et al., manuscript submitted).
Women  were  sampled  at  18  body  habitats,  men  at  15  (excluding  three 
vaginal  sites),  distributed  among  five  major  body  areas.  Nine  specimens 
were  collected  from  the  oral  cavity  and  oropharynx:  saliva;  buccal 
mucosa  (cheek),  keratinized  gingiva  (gums),  palate,  tonsils,  throat 
and  tongue  soft  tissues,  and  supra-  and  subgingival  dental  plaque  (tooth 
biofilm  above  and  below  the  gum).  Four  skin  specimens  were  collected 
from  the  two  retroauricular  creases  (behind  each  ear)  and  the  two 
antecubital  fossae  (inner  elbows),  and  one  specimen  for  the  anterior 
nares (nostrils). A self-collected stool specimen represented the microbiota
of the lower gastrointestinal tract, and three vaginal specimens
were  collected  from  the  vaginal  introitus,  midpoint  and  posterior 
fornix.  To  evaluate  within-subject  stability  of  the  microbiome,  131 
individuals  in  these  data  were  sampled  at  an  additional  time  point 
(mean  219  days  and  s.d.  69  days  after  first  sampling,  range  35-404  days). 
After  quality  control,  these  specimens  were  used  for  16S  rRNA  gene 
analysis via 454 pyrosequencing (abbreviated henceforth as 16S profiling,
mean 5,408 and s.d. 4,605 filtered sequences per sample); to assess
function, 681 samples were sequenced using paired-end Illumina
shotgun metagenomic reads (mean 2.9 gigabases (Gb) and s.d. 2.1 Gb
per  sample)1.  More  details  on  data  generation  are  provided  in  related 
HMP  publications1  and  in  Supplementary  Methods. 

Microbial  diversity  of  healthy  humans 

The  diversity  of  microbes  within  a  given  body  habitat  can  be  defined  as 
the  number  and  abundance  distribution  of  distinct  types  of  organisms, 
which  has  been  linked  to  several  human  diseases:  low  diversity  in  the 
gut to obesity and inflammatory bowel disease2,3, for example, and high
diversity in the vagina to bacterial vaginosis4. For this large study
involving microbiome samples collected from healthy volunteers at
two distinct geographic locations in the United States, we have defined
the microbial communities at each body habitat, encountering 81-99%
of  predicted  genera  and  saturating  the  range  of  overall  community 
configurations  (Fig.  1,  Supplementary  Fig.  1  and  Supplementary 
Table  1;  see  also  Fig.  4).  Oral  and  stool  communities  were  especially 
diverse in terms of community membership, expanding prior observations5,
and vaginal sites harboured particularly simple communities
(Fig. 1a). This study established that these patterns of alpha diversity
(within samples) differed markedly from comparisons between
samples from the same habitat among subjects (beta diversity,
Fig. 1b). For example, the saliva had among the highest median alpha
diversities  of  operational  taxonomic  units  (OTUs,  roughly  species  level 
classification,  see  http://hmpdacc.org/HMQCP),  but  one  of  the  lowest 
beta  diversities — so  although  each  individual’s  saliva  was  ecologically 
rich,  members  of  the  population  shared  similar  organisms.  Conversely, 
the  antecubital  fossae  (skin)  had  the  highest  beta  diversity  but  were 
intermediate  in  alpha  diversity.  The  vagina  had  the  lowest  alpha  diversity, 
with  quite  low  beta  diversity  at  the  genus  level  but  very  high  among 
OTUs due to the presence of distinct Lactobacillus spp. (Fig. 1b). The
primary  patterns  of  variation  in  community  structure  followed  the 
major  body  habitat  groups  (oral,  skin,  gut  and  vaginal),  defining  as  a 
result the complete range of population-wide between-subject variation
in human microbiome habitats (Fig. 1c). Within-subject variation over
time was consistently lower than between-subject variation, both in
organismal composition and in metabolic function (Fig. 1d). The
uniqueness  of  each  individual’s  microbial  community  thus  seems  to 
be  stable  over  time  (relative  to  the  population  as  a  whole),  which  may  be 
another  feature  of  the  human  microbiome  specifically  associated  with 
health. 
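
The two diversity notions contrasted above can be made concrete with a minimal sketch. It assumes the plain inverse Simpson index for alpha diversity and Bray-Curtis dissimilarity on relative abundances for beta diversity (both measures are named in the Figure 1 legend, although the exact normalization used there may differ). The function names and the toy counts are hypothetical and are not HMP data.

```python
from typing import Sequence


def inverse_simpson(counts: Sequence[float]) -> float:
    """Alpha diversity: 1 / sum(p_i^2) over the relative abundances p_i."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample contains no observations")
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Beta diversity: Bray-Curtis dissimilarity of two abundance profiles."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    total_a, total_b = sum(a), sum(b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each sample needs at least one observation")
    # Normalise to relative abundances so sequencing depth does not matter.
    pa = [x / total_a for x in a]
    pb = [x / total_b for x in b]
    # Both profiles sum to 1, so the Bray-Curtis denominator equals 2.
    return sum(abs(x - y) for x, y in zip(pa, pb)) / 2.0


# Hypothetical counts for four taxa in two subjects per habitat.
saliva_like = ([25, 25, 25, 25], [30, 20, 25, 25])
skin_like = ([90, 10, 0, 0], [0, 10, 90, 0])

for name, (s1, s2) in (("saliva-like", saliva_like), ("skin-like", skin_like)):
    print(name, round(inverse_simpson(s1), 2), round(bray_curtis(s1, s2), 2))
# saliva-like 4.0 0.05
# skin-like 1.22 0.9
```

In this toy case the even, shared community scores high on alpha and low on beta diversity, while the uneven communities dominated by different taxa show the opposite pattern, mirroring the saliva and skin contrast described in the text.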

*Lists of participants and their affiliations appear at the end of the paper.

14 JUNE 2012 | VOL 486 | NATURE | 207

©2012 Macmillan Publishers Limited. All rights reserved

RESEARCH ARTICLE

[Figure 1 image. Recovered panel and axis labels: Within-sample alpha diversity; Between-sample beta diversity; Gastrointestinal; Nasal; Skin.]

Figure 1 | Diversity of the human microbiome is concordant among
measures, unique to each individual, and strongly determined by microbial
habitat. a, Alpha diversity within subjects by body habitat, grouped by area, as
measured using the relative inverse Simpson index of genus-level phylotypes
(cyan), 16S rRNA gene OTUs (blue), shotgun metagenomic reads matched to
reference genomes (orange), functional modules (dark orange), and enzyme
families (yellow). The mouth generally shows high within-subject diversity and
the vagina low diversity, with other habitats intermediate; variation among
individuals often exceeds variation among body habitats. b, Bray-Curtis beta
diversity among subjects by body habitat, colours as for a. Skin differs most
between subjects, with oral habitats and vaginal genera more stable. Although
alpha- and beta-diversity are not directly comparable, changes in structure
among communities (a) occupy a wider dynamic range than do changes within
communities among individuals (b). c, Principal coordinates plot showing
variation among samples demonstrates that primary clustering is by body area,
with the oral, gastrointestinal, skin and urogenital habitats separate; the nares
habitat bridges oral and skin habitats. d, Repeated samples from the same
subject (blue) are more similar than microbiomes from different subjects (red).
Technical replicates (grey) are in turn more similar; these patterns are
consistent for all body habitats and for both phylogenetic and metabolic
community composition. See previously described sample counts1 for all
comparisons.

No taxa were observed to be universally present among all body
habitats and individuals at the sequencing depth employed here,
unlike several pathways (Fig. 2 and Supplementary Fig. 2, see below),
although several clades demonstrated broad prevalence and relatively
abundant carriage patterns6,7. Instead, as suggested by individually
focused studies2,3,5,8,9, each body habitat in almost every subject was
characterized  by  one  or  a  few  signature  taxa  making  up  the  plurality  of 
the  community  (Fig.  3).  Signature  clades  at  the  genus  level  formed  on 
average  anywhere  from  17%  to  84%  of  their  respective  body  habitats, 
completely  absent  in  some  communities  (0%  at  this  level  of  detection) 
and  representing  the  entire  population  (100%)  in  others.  Notably,  less 
dominant  taxa  were  also  highly  personalized,  both  among  individuals 
and  body  habitats;  in  the  oral  cavity,  for  example,  most  habitats  are 
dominated  by  Streptococcus,  but  these  are  followed  in  abundance  by 
Haemophilus  in  the  buccal  mucosa,  Actinomyces  in  the  supragingival 
plaque,  and  Prevotella  in  the  immediately  adjacent  (but  low  oxygen) 
subgingival  plaque10. 

Additional taxonomic detail of the human microbiome was provided
by identifying unique marker sequences in metagenomic data11
(Fig.  3a)  to  complement  16S  profiling  (Fig.  3b).  These  two  profiles 
were  typically  in  close  agreement  (Supplementary  Fig.  3),  with  the 
former  in  some  cases  offering  more  specific  information  on  members 
of  signature  genera  differentially  present  within  habitats  (for  example, 
vaginal Prevotella amnii and gut Prevotella copri) or among individuals
(for example, vaginal Lactobacillus spp.). One application of this
specificity  was  to  confirm  the  absence  of  NIAID  (National  Institute  of 


Allergy  and  Infectious  Diseases)  class  A-C  pathogens  above  0.1% 
abundance  (aside  from  Staphylococcus  aureus  and  Escherichia  coli) 
from the healthy microbiome, but the near-ubiquity and broad distribution
of opportunistic ‘pathogens’ as defined by PATRIC12.
Canonical  pathogens  including  Vibrio  cholerae,  Mycobacterium 
avium,  Campylobacter  jejuni  and  Salmonella  enterica  were  not 
detected  at  this  level  of  sensitivity.  Helicobacter  pylori  was  found  in 
only  two  stool  samples,  both  at  <0.01%,  and  E.  coli  was  present  at 
>0.1%  abundance  in  15%  of  stool  microbiomes  (>0%  abundance  in 
61%).  Similar  species-level  observations  were  obtained  for  a  small 
subset  of  stool  samples  with  454  pyrosequencing  metagenomics  data 
using  PhylOTU13,14.  In  total  56  of  327  PATRIC  pathogens  were 
detected  in  the  healthy  microbiome  (at  >1%  prevalence  of  >0.1% 
abundance,  Supplementary  Table  2),  all  opportunistic  and,  strikingly, 
typically  prevalent  both  among  hosts  and  habitats.  The  latter  is  in 
contrast  to  many  of  the  most  abundant  signature  taxa,  which  were 
usually more habitat-specific and variable among hosts (Fig. 3a, b).
This  overall  absence  of  particularly  detrimental  microbes  supports  the 
hypothesis  that  even  given  this  cohort’s  high  diversity,  the  microbiota 
tend  to  occupy  a  range  of  configurations  in  health  distinct  from  many 
of  the  disease  perturbations  studied  to  date3,15. 
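
The detection criteria quoted above combine an abundance threshold with a prevalence threshold (for example, more than 1% prevalence at more than 0.1% abundance). The following is a minimal sketch of that kind of filter, assuming each taxon has one relative abundance per sample expressed as a fraction; the function names, default thresholds as coded, and the toy values are illustrative assumptions rather than the pipeline actually used.

```python
from typing import Dict, List


def prevalence(abundances: List[float], min_abundance: float) -> float:
    """Fraction of samples whose relative abundance exceeds min_abundance."""
    if not abundances:
        raise ValueError("need at least one sample")
    return sum(1 for a in abundances if a > min_abundance) / len(abundances)


def detected_taxa(
    profiles: Dict[str, List[float]],
    min_abundance: float = 0.001,   # 0.1% relative abundance
    min_prevalence: float = 0.01,   # 1% of samples
) -> List[str]:
    """Taxa exceeding min_abundance in more than min_prevalence of samples."""
    return sorted(
        taxon
        for taxon, values in profiles.items()
        if prevalence(values, min_abundance) > min_prevalence
    )


# Hypothetical relative abundances (fractions) across four samples.
toy_profiles = {
    "taxon_A": [0.02, 0.0, 0.005, 0.0004],
    "taxon_B": [0.0005, 0.0, 0.0, 0.0],
}

print(prevalence(toy_profiles["taxon_A"], 0.001))  # 0.5
print(prevalence(toy_profiles["taxon_A"], 0.0))    # 0.75
print(detected_taxa(toy_profiles))                 # ['taxon_A']
```

Reporting the same taxon at two abundance cut-offs, as done for E. coli in the text, corresponds to calling `prevalence` twice with different `min_abundance` values.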
Figure 2 | Carriage of microbial taxa varies while metabolic pathways
remain stable within a healthy population. a, b, Vertical bars represent
microbiome samples by body habitat in the seven locations with both shotgun
and 16S data; bars indicate relative abundances coloured by microbial phyla
from binned OTUs (a) and metabolic modules (b). Legend indicates most
abundant phyla/pathways by average within one or more body habitats; RC,
retroauricular crease. A plurality of most communities’ memberships consists
of a single dominant phylum (and often genus; see Supplementary Fig. 2), but
this is universal neither to all body habitats nor to all individuals. Conversely,
most metabolic pathways are evenly distributed and prevalent across both
individuals and body habitats.


Mean  non-zero  abundance  (size)  and  population  prevalence  (intensity)  of  microbial  clades 
Figure 3 | Abundant taxa in the human microbiome that have been
metagenomically and taxonomically well defined in the HMP population.
a-c, Prevalence (intensity, colour denoting phylum/class) and abundance when
present (size) of clades in the healthy microbiome. The most abundant
metagenomically-identified species (a), 16S-identified genera (b) and
PATRIC12 pathogens (metagenomic) (c) are shown. d, e, The population size
and sequencing depths of the HMP have well defined the microbiome at all
assayed body sites, as assessed by saturation of added community metabolic
configurations (rarefaction of minimum Bray-Curtis beta-diversity of
metagenomic enzyme class abundances to nearest neighbour, inter-quartile
range over 100 samples) (d) and phylogenetic configurations (minimum 16S
OTU weighted UniFrac distance to nearest neighbour) (e).

Carriage of specific microbes

Inter-individual  variation  in  the  microbiome  proved  to  be  specific, 
functionally relevant and personalized. One example of this is illustrated
by the Streptococcus spp. of the oral cavity. The genus dominates
the  oropharynx16,  with  different  species  abundant  within  each  sampled 
body  habitat  (see  http://hmpdacc.org/HMSMCP)  and,  even  at  the 
species  level,  marked  differences  in  carriage  within  each  habitat  among 
individuals  (Fig.  4a).  As  the  ratio  of  pan-  to  core-genomes  is  high  in 
many  human-associated  microbes17,  this  variation  in  abundance  could 
be  due  to  selective  pressures  acting  on  pathways  differentially  present 
among  Streptococcus  species  or  strains  (Fig.  4b).  Indeed,  we  observed 
extensive  strain-level  genomic  variation  within  microbial  species  in 
this  population,  enriched  for  host-specific  structural  variants  around 
genomic  islands  (Fig.  4c).  Even  with  respect  to  the  single  Streptococcus 
mitis strain B6, gene losses associated with these events were common,
for example differentially eliminating S. mitis carriage of the V-type
ATPase or choline binding proteins cbp6 and cbp12 among subsets of
the  host  population  (Fig.  4d).  These  losses  were  easily  observable  by 
comparison  to  reference  isolate  genomes,  and  these  initial  findings 
indicate  that  microbial  strain-  and  host-specific  gene  gains  and 
polymorphisms  may  be  similarly  ubiquitous. 
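
The coverage comparison against a reference genome can be sketched as follows. The Figure 4 legend reports reads per kilobase per million reads (RPKM) in 1-kb windows; the code below computes that quantity from read start positions and then flags windows whose coverage falls far below the median. The flagging rule and its `fraction` parameter are hypothetical illustrations, not the criterion used to call gene losses in the study, and all function names are assumptions.

```python
import statistics
from typing import List, Sequence


def rpkm(reads_in_window: int, window_bp: int, total_reads: int) -> float:
    """Reads per kilobase of window per million mapped reads."""
    return reads_in_window / ((window_bp / 1_000) * (total_reads / 1_000_000))


def window_rpkm(
    read_starts: Sequence[int],
    genome_length: int,
    total_reads: int,
    window_bp: int = 1_000,
) -> List[float]:
    """RPKM for consecutive windows along one reference genome."""
    n_windows = -(-genome_length // window_bp)  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < genome_length:
            counts[pos // window_bp] += 1
    values = []
    for i, count in enumerate(counts):
        # The final window may be shorter than window_bp.
        width = min(window_bp, genome_length - i * window_bp)
        values.append(rpkm(count, width, total_reads))
    return values


def low_coverage_windows(values: Sequence[float], fraction: float = 0.1) -> List[int]:
    """Indices of windows below `fraction` of the median (hypothetical rule)."""
    cutoff = fraction * statistics.median(values)
    return [i for i, v in enumerate(values) if v < cutoff]


# Toy example: five reads on a 3-kb genome, one million mapped reads in total.
print(window_rpkm([10, 500, 999, 1000, 2500], 3_000, 1_000_000))  # [3.0, 1.0, 1.0]
print(low_coverage_windows([10.0, 12.0, 0.5, 11.0]))              # [2]
```

A window flagged in one subject but well covered in others would be a candidate for the kind of host-specific gene loss described above, subject to confirmation by a proper statistical test.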

Figure 4 | Microbial carriage varies between subjects down to the species
and strain level. Metagenomic reads from 127 tongue samples spanning 90
subjects were processed with MetaPhlAn to determine relative abundances for
each species. a, Relative abundances of 11 distinct Streptococcus spp. In addition
to variation in broader clades (see Fig. 2), individual species within a single
habitat demonstrate a wide range of compositional variation. Inset illustrates
average tongue sample composition. b, Metabolic modules present/absent
(grey/white) in KEGG24 reference genomes of tongue streptococci denote
selected areas of strain-specific functional differentiation. cpnt, component.
c, Comparative genomic coverage for the single Streptococcus mitis B6 strain.
Grey dots are median reads per kilobase per million reads (RPKM) for 1-kb
windows, grey bars are the 25th to 75th percentiles across all samples, red line
the LOWESS-smoothed average. Red bars at the bottom highlight predicted
genomic islands27. Large, discrete, and highly variable islands are commonly
under-represented. d, Two islands are highlighted, V (V-type H+ ATPase
subunits I, K, E, C, F, A and B) and CH (choline-binding proteins cbp6 and
cbp12), indicating functional cohesion of strain-specific gene loss within
individual human hosts.

Other examples of functionally relevant inter-individual variation
at the species and strain levels occurred throughout the microbiome.
In the gut, Bacteroides fragilis has been shown to prime T-cell
responses in animal models via the capsular polysaccharide A18,
and in the HMP stool samples this taxon was carried at a level of at
least 0.1% in 16% of samples (over 1% abundance in 3%). Bacteroides
thetaiotaomicron has been studied for its effect on host gastrointestinal
metabolism19 and was likewise common at 46% prevalence. On the skin,
S. aureus, of particular interest as the cause of methicillin-resistant
S. aureus (MRSA) infections, had 29% nasal and 4% skin carriage
rates,  roughly  as  expected20.  Close  phylogenetic  relatives  such  as 
Staphylococcus  epidermidis  (itself  considered  commensal)  were,  in 
contrast,  universal  on  the  skin  and  present  in  93%  of  nares  samples, 
and  at  the  opposite  extreme  Pseudomonas  aeruginosa  (a  representative 
Gram-negative skin pathogen) was completely absent from both body
habitats (0% at this level of detection). These and the data above suggest
that  the  carriage  pattern  of  some  species  in  the  human  microbiome 
may  be  analogous  to  genetic  traits,  where  recessive  alleles  of  modest 
risk are maintained in a population. In the case of the human microbiome,
high-risk pathogens remain absent, whereas species that pose a
modest  degree  of  risk  also  seem  to  be  stably  maintained  in  this 
ecological  niche. 

Finally,  microorganisms  within  and  among  body  habitats  exhibited 
relationships  suggestive  of  driving  physical  factors  such  as  oxygen, 
moisture and pH, host immunological factors, and microbial interactions
such as mutualism or competition21 (Supplementary Fig. 4).
Both  overall  community  similarity  and  microbial  co-occurrence  and 
co-exclusion  across  the  human  microbiome  grouped  the  18  body 
habitats  together  into  four  clusters  corresponding  to  the  five  target 
body  areas  (Supplementary  Fig.  4a,  b).  There  was  little  distinction 
among  different  vaginal  sites,  with  Lactobacillus  spp.  dominating  all 
three  and  correlating  in  abundance.  However,  Lactobacillus  varied 
inversely with the Actinobacteria and Bacteroidetes (see Supplementary
Fig. 4c and Figs 2 and 3), as also observed in a previous cohort9.
Gut microbiota relationships primarily comprised inverse associations
with the Bacteroides, which ranged from dominant in some
subjects  to  a  minority  in  others  who  carried  a  greater  diversity  of 
Firmicutes.  A  similar  progression  was  evident  in  the  skin  communities, 
dominated  by  one  of  Staphylococcus  (phylum  Firmicutes), 
Propionibacterium,  or  Corynebacterium  (both  phylum  Actinobacteria), 
with a continuum of oral organisms (for example, Streptococcus) appearing
in nares communities (Supplementary Fig. 4c). These observations
suggest  that  microbial  community  structure  in  these  individuals 
may  sometimes  occupy  discrete  configurations  and  under  other 
circumstances  vary  continuously,  a  topic  addressed  in  more  detail  by 
several  HMP  investigations  (ref.  6  and  unpublished  results).  An 
individual’s  location  within  such  configurations  is  indicative  of  current 
microbial  carriage  (including  pathogens)  and  of  the  community’s 
ability  to  resist  future  pathogen  acquisition  or  dysbiosis;  it  may  thus 
prove  to  be  associated  with  disease  susceptibility  or  other  phenotypic 
characteristics. 


Microbiome  metabolism  and  function 

As  the  first  study  to  include  both  marker  gene  and  metagenomic  data 
across  body  habitats  from  a  large  human  population,  we  additionally 
assessed  the  ecology  of  microbial  metabolic  and  functional  pathways 
in  these  communities.  We  reconstructed  the  relative  abundances  of 
pathways  in  community  metagenomes22,  which  were  much  more 
constant  and  evenly  diverse  than  were  organismal  abundances 
(Fig.  2b,  see  also  Fig.  1),  confirming  this  as  an  ecological  property  of 
the  entire  human  microbiome2.  We  were  likewise  able  to  determine 
for  the  first  time  that  taxonomic  and  functional  alpha  diversity  across 
microbial  communities  significantly  correlate  (Spearman  of  inverse 
Simpson’s r = 0.60, P = 3.6 × 10^-67, n = 661), the latter within a
more circumscribed range of community configurations (Supplementary
Fig. 5).
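
The rank correlation reported above can be illustrated with a small, dependency-free sketch. It assumes the standard definition of Spearman's coefficient as the Pearson correlation of average ranks (ties share the mean of their positions); the helper names and the toy diversity values are hypothetical, and no P value is computed here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical taxonomic and functional inverse Simpson values for five samples.
taxonomic = [1.0, 2.0, 3.0, 4.0, 5.0]
functional = [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(spearman(taxonomic, functional), 3))  # 0.8
```

Because only ranks enter the calculation, the coefficient is insensitive to the very different numerical ranges that taxonomic and functional diversity values occupy.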

Unlike  microbial  taxa,  several  pathways  were  ubiquitous  among 
individuals  and  body  habitats.  The  most  abundant  of  these  ‘core’ 
pathways include the ribosome and translational machinery, nucleotide
charging and ATP synthesis, and glycolysis, and reflect the basics
of host-associated microbial life. Also in contrast to taxa, few pathways
were highly variable among subjects within any body habitat;
exceptions  included  the  Sec  (orally,  pathway  relative  abundance 
s.d.  =  0.0052;  total  mean  of  oral  standard  deviations  =  0.0011  with 
s.d. = 0.0016) and Tat (globally, pathway s.d. = 0.0055; mean of
global standard deviations = 0.0023 with s.d. = 0.0033) secretion systems,
indicating a high degree of host-microbe and microbe-microbe
interactions  in  the  healthy  human  microbiota.  This  high  variability 
was  particularly  present  in  the  oral  cavity;  for  phosphate,  mono-  and 
di-saccharide,  and  amino  acid  transport  in  the  mucosa;  and  also  for 
lipopolysaccharide  biosynthesis  and  spermidine/putrescine  synthesis 
and  transport  on  the  plaque  and  tongue  (http://hmpdacc.org/ 
HMMRC).  The  stability  and  high  metagenomic  abundance  of  this 
housekeeping  ‘core’  contrasts  with  the  greater  variability  and  lower 
abundance  of  niche-specific  functionality  in  rare  but  consistently 
present  pathways;  for  example,  spermidine  biosynthesis,  methionine 
degradation  and  hydrogen  sulphide  production,  all  examples  highly 
prevalent  in  gastrointestinal  body  sites  (non-zero  in  >92%  of 
samples)  but  at  very  low  abundance  (median  relative  abundance 
<  0.0052).  This  ‘long  tail’  of  low-abundance  genes  and  pathways  also 
probably  encodes  much  of  the  uncharacterized  biomolecular  function 
and  metabolism  of  these  metagenomes,  the  expression  levels  of  which 
remain  to  be  explored  in  future  metatranscriptomic  studies. 
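
To put the variability figures quoted above for the Sec and Tat secretion systems in context, the short calculation below expresses each pathway's standard deviation as a number of standard deviations above the mean of all pathway standard deviations. It uses only the values given in the text; the standardization itself is an illustrative assumption and is not a statistic reported in the study.

```python
def sds_above_mean(pathway_sd: float, mean_sd: float, sd_of_sds: float) -> float:
    """How far a pathway's s.d. lies above the mean s.d., in units of s.d."""
    return (pathway_sd - mean_sd) / sd_of_sds


# Values quoted in the text (relative abundance units).
sec_oral = sds_above_mean(pathway_sd=0.0052, mean_sd=0.0011, sd_of_sds=0.0016)
tat_global = sds_above_mean(pathway_sd=0.0055, mean_sd=0.0023, sd_of_sds=0.0033)

print(round(sec_oral, 2))    # 2.56
print(round(tat_global, 2))  # 0.97
```

On this rough scale the oral Sec system stands out more clearly from its background than the global Tat system does, although both exceed the mean variability of their comparison sets.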

Protein  families  showed  diversity  and  prevalence  trends  similar  to 
those of full pathways, ranging from maxima of only ~16,000 unique
families per community in the vagina to almost 400,000 in the oral
cavity (Fig. 1a, b; http://hmpdacc.org/HMGI). A remarkable fraction
of  these  families  were  indeed  functionally  uncharacterized,  including 
those  detected  by  read  mapping,  with  a  minimum  in  the  oral  cavity 
(mean  58%  s.d.  6.8%)  and  maximum  in  the  nares  (mean  77%  s.d. 
11%).  Likewise,  many  genes  annotated  from  assemblies  could  not 
be  assigned  a  metabolic  function,  with  a  minimum  in  the  vagina 
(mean  78%  s.d.  3.4%)  and  maximum  in  the  gut  (mean  86%  s.d. 
0.9%).  The  latter  range  did  not  differ  substantially  by  body  habitat 
and  is  in  close  agreement  with  previous  comprehensive  gene  catalogues 
of  the  gut  metagenome3.  Taken  together  with  the  microbial  variation 
observed above throughout the human microbiome, functional variation
among individuals might indicate pathways of particular importance
in maintaining community structure in the face of personalized
immune, environmental or dietary exposures among these subjects.
Determining the functions of uncharacterized core and variable protein
families will be essential in understanding the role of the microbiota
in health and disease.

Correlations  with  host  phenotype 

We  finally  examined  relationships  associating  both  clades  and 
metabolism  in  the  microbiota  with  host  properties  such  as  age, 
gender,  body  mass  index  (BMI),  and  other  available  clinical  metadata 
(Fig.  5  and  Supplementary  Table  3).  Using  a  sparse  multivariate 
model, 960 microbial, enzymatic or pathway abundances were significantly
associated with one or more of 15 subject phenotype and
sample  metadata  features.  A  wide  variety  of  taxa,  gene  families  and 
metabolic  pathways  were  differentially  distributed  with  subject 
ethnicity  at  every  body  habitat  (Fig.  5a),  representing  the  phenotype 
with  the  greatest  number  (266  at  false  discovery  rate  (FDR)  q  <  0.2)  of 
total  associations  with  the  microbiome.  Vaginal  pH  has  also  been 
observed  to  correlate  with  microbiome  composition9,  and  we  detected 
in  this  population  both  the  expected  reduction  in  Lactobacillus  at  high 
pH  and  a  corresponding  increase  in  metabolic  diversity  (Fig.  5b). 
Intriguingly, and not previously observed, subject age was most associated
with a collection of highly differential metagenomically
encoded pathways on the skin (Fig. 5c), as well as shifts in skin clades
including retroauricular Firmicutes (P = 1.0 × 10^-4, q = 0.033). The
examples of associations with ethnicity and vaginal pH are among the
strongest associations with the microbiome, however, and most correlates
(for example, with subject BMI, Fig. 5d) are more representatively
modest. This lower degree of correlation held for most available
biometrics  (gender,  temperature,  blood  pressure,  etc.),  with  even  the 
most  significant  associations  possessing  generally  low  effect  sizes  and 
considerable  unexplained  variance.  We  conclude  that  most  variation 
in  the  human  microbiome  is  not  well  explained  by  these  phenotypic 
metadata, and other potentially important factors such as short- and
long-term diet, daily cycles, founder effects such as mode of delivery,
and host genetics should be considered in future analyses.

Figure 5 | Microbial community membership and function correlates with
host phenotype and sample metadata. a-d, The pathway and clade
abundances most significantly associated (all FDR q < 0.2) using a multivariate
linear model with subject race or ethnicity (a), vaginal posterior fornix pH
(b), subject age (c) and BMI (d). Scatter plots of samples are shown with lines
indicating best simple linear fit. Race/ethnicity and vaginal pH are particularly
strong associations; age and BMI are more representative of typically modest
phenotypic associations (Supplementary Table 3), suggesting that variation in
the healthy microbiota may correspond to other host or environmental factors.

Conclusions

This extensive sampling of the human microbiome across many subjects
and body habitats provides an initial characterization of the
normal microbiota of healthy adults in a Western population. The
large sample size and consistent sampling of many sites from the same
individuals allows for the first time an understanding of the relationships
among microbes, and between the microbiome and clinical parameters,
that underpin the basis for individual variation, variation that may
ultimately be critical for understanding microbiome-based disorders.
Clinical studies of the microbiome will be able to leverage the resulting
extensive catalogues of taxa, pathways and genes1, although they must
also still include carefully matched internal controls. The uniqueness of
each individual’s microbiome even in this reference population argues
for future studies to consider prospective within-subjects designs where
possible. The HMP’s unique combination of organismal and functional
data across body habitats, encompassing both 16S and metagenomic
profiling, together with detailed characterization of each subject, has
allowed us and subsequent studies to move beyond the observation of
variability in the human microbiome to ask how and why these microbial
communities vary so extensively.

Many  details  remain  for  further  work  to  fill  in,  building  on  this 
reference  study.  How  do  early  colonization  and  lifelong  change  vary 
among  body  habitats?  Do  epidemiological  patterns  of  transmission  of 
beneficial  or  harmless  microbes  mirror  patterns  of  transmission  of 
pathogens?  Which  co-occurrences  among  microbes  reflect  shared 
response  to  the  environment,  as  opposed  to  competitive  or  mutualistic 
interactions?  How  large  a  role  does  host  immunity  or  genetics  play  in 
shaping  patterns  of  diversity,  and  how  do  the  patterns  observed  in  this 
North  American  population  compare  to  those  around  the  world?  Future 
studies  building  on  the  gene  and  organism  catalogues  established  by  the 
Human Microbiome Project, including increasingly detailed investigations
of metatranscriptomes and metaproteomes, will help to unravel
these  open  questions  and  allow  us  to  more  fully  understand  the  links 
between  the  human  microbiome,  health  and  disease. 

METHODS  SUMMARY 

Microbiome  samples  were  collected  from  up  to  18  body  sites  at  one  or  two  time 
points  from  242  individuals  clinically  screened  for  absence  of  disease  (K.  Aagaard 
et al., manuscript submitted). Samples were subjected to 16S ribosomal RNA gene
pyrosequencing  (454  Life  Sciences),  and  a  subset  were  shotgun-sequenced  for 
metagenomics  using  the  Illumina  GAIIx  platform1.  16S  data  processing  and 
diversity estimates were performed using QIIME23, and metagenomic data were
taxonomically  profiled  using  MetaPhlAn11,  metabolically  profiled  by  HUMAnN22, 
and  assembled  for  gene  annotation  and  clustering  into  a  unique  catalogue1. 
Potential  pathogens  were  identified  using  the  PATRIC  database12,  isolate  reference 
genome annotations drawn from KEGG24, and reference genome mapping performed
by BWA25 to a reduced set of genomes to which short reads could be
matched26.  Microbial  associations  were  assessed  by  similarity  measures  accounting 
for  compositionality21,  and  phenotypic  association  testing  was  performed  in  R.  All 
data  and  additional  protocol  details  are  available  at  http://hmpdacc.org.  Full 
methods  accompany  this  paper  in  the  Supplementary  Information. 
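
The phenotype associations in this paper are reported with false discovery rate (FDR) q values (for example, q < 0.2). This summary does not state which FDR procedure was applied, so the sketch below assumes the widely used Benjamini-Hochberg adjustment purely for illustration; it is written in Python rather than R, and the function name and the P values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg q values, returned in the order of the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical P values for five feature-phenotype tests.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
print([round(v, 5) for v in q])           # [0.005, 0.02, 0.05125, 0.05125, 0.6]
print([i for i, v in enumerate(q) if v < 0.2])  # [0, 1, 2, 3]
```

With a threshold of q < 0.2, four of the five toy tests would be retained, which illustrates why associations passing such a lenient cut-off can still have modest effect sizes.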